Global Development: Intermediate Topics in Politics, Policy, and Data
PSCI 3200 - Spring 2024
Logistics
Assignments
Did you send me a Quarto file? If not, please do.
Announcements
Next week: RStudio and Quarto workshop with Jeremy
Agenda for today
The power of randomization
What is confounding?
How randomization addresses confounding
RCTs
Causality with observational data
Final Project
What are your options?
Possible data sources
Causality as Explanation
Last week, we discussed the fundamental problem of causal inference:
We can never observe what could have happened - or the counterfactual outcome
This prevents us from ever observing individual treatment effects, but when treatment assignment is independent of the potential outcomes...
We can estimate average causal effects
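A minimal simulation sketch of this idea. All numbers here (sample size, means, the effect of 5) are made up for illustration. In a simulation we can see both potential outcomes for every unit, which is never possible with real data.

```r
set.seed(3200)
n <- 10000

# Hypothetical potential outcomes: y0 without treatment, y1 with treatment
y0 <- rnorm(n, mean = 50, sd = 10)
y1 <- y0 + rnorm(n, mean = 5, sd = 2)

# The true average effect, knowable only because this is a simulation
true_ate <- mean(y1 - y0)

# Treatment assigned by coin flip, so it is independent of y0 and y1
treated <- rbinom(n, 1, 0.5)

# In practice we observe only one potential outcome per unit
y_obs <- ifelse(treated == 1, y1, y0)

# Difference in observed group means
diff_in_means <- mean(y_obs[treated == 1]) - mean(y_obs[treated == 0])

true_ate
diff_in_means
```

The two printed numbers should be close to each other, even though no individual effect `y1 - y0` is used to compute `diff_in_means`.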
DAGs
One useful way to think about causality is using Directed Acyclic Graphs (DAGs)
We know causal inference requires assumptions, and DAGs are a way for us to visualize those assumptions
In a DAG, each node is a variable and the edge represents a causal relationship. For example “X causes Y”:
DAGs
Of course, we know that often more than one variable can cause an outcome
X is independent of the other causes of Y if X is “separated” from them in the graph: no arrows run between X and those variables, and they share no common cause.
DAGs
What if X and Y are both caused by some other variable, U? Are X and Y independent?
Can we compare the treatment group of X with the control group of X?
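A small simulation sketch of this situation, with made-up numbers. Here U causes both X and Y, and X has no effect on Y at all. The coefficients (2 and 3) are assumptions chosen only to make the pattern visible.

```r
set.seed(3200)
n <- 10000

# U is the common cause
u <- rnorm(n)

# U raises the probability of being treated (X = 1)
x <- rbinom(n, 1, plogis(2 * u))

# U also raises Y; note that X does not appear here, so its true effect is 0
y <- 3 * u + rnorm(n)

# Naive comparison of the "treatment group" and "control group" of X
naive_diff <- mean(y[x == 1]) - mean(y[x == 0])
naive_diff
```

The naive difference is far from zero even though X does nothing. The two groups differ in U, so comparing them compares more than treatment status.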
In the wild
Which is it?
Selecting on the dependent variable
Dealing with confounders
Sometimes it’s easy to think of what variables could affect both our treatment and our outcome
If we feel theoretically confident that we can observe all variables that confound the relationship between X and Y, we can control for them and estimate causal effects
BIG BIG BIG assumption (called the Conditional Independence Assumption)
We cannot do anything with confounders we cannot observe!
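A sketch of what "controlling for" an observed confounder can look like, using simulated data. Regression with `lm()` is one common way to do this, not the only one. The true effect of 2 and the other coefficients are assumptions of the simulation.

```r
set.seed(3200)
n <- 10000

# U confounds the relationship: it affects both treatment and outcome
u <- rnorm(n)
x <- rbinom(n, 1, plogis(u))
y <- 2 * x + 3 * u + rnorm(n)   # true effect of X on Y is 2

# Ignoring U: the coefficient on x absorbs part of U's effect
naive <- unname(coef(lm(y ~ x))["x"])

# Controlling for U: possible only because we observe it
adjusted <- unname(coef(lm(y ~ x + u))["x"])

naive
adjusted
```

The adjusted estimate should land near 2, while the naive one is too large. If `u` were unobserved, the second regression could not be run, which is the point of the last bullet above.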
library(dplyr)    # provides the %>% pipe
library(ggplot2)

# Toy data: outcomes are observed only in years 1 and 2 for each group
Year = c(0, 1, 2, 3)   # recycled across both groups by data.frame()
Outcome = c(NA, 1.2, 2, NA, NA, 1.3, 1.7, NA)
Treatment = c("Control", "Control", "Control", "Control",
              "Treatment", "Treatment", "Treatment", "Treatment")
dat = data.frame(Year, Outcome, Treatment)

# One line per group, with large points at the observed values
dat %>%
  ggplot(aes(x = Year, y = Outcome, group = Treatment, color = Treatment)) +
  geom_line(aes(linetype = Treatment), size = 2) +
  geom_point(size = 6) +
  xlim(0, 3) +
  scale_y_continuous(limits = c(1, 2), breaks = seq(1, 2, by = .1)) +
  scale_linetype_manual(values = c("solid", "solid")) +
  scale_color_manual(values = c("red", "blue")) +
  theme(legend.position = c(0.8, 0.2), text = element_text(size = 20))
Randomization as a way to get independence
Independence is a crucial assumption!
One way to make it more convincing is to randomize treatment assignment.
If treatment assignment depends on luck, and not on any characteristic of the units, then we have a good theoretical reason to assume that treatment is independent of the potential outcomes.
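A simulation sketch of why randomization helps, again with made-up numbers. U still affects Y, but now X is assigned by lottery, so U cannot influence who gets treated.

```r
set.seed(3200)
n <- 10000

# U affects the outcome, and we pretend we cannot observe it
u <- rnorm(n)

# Complete random assignment: exactly half the units are treated
x <- sample(rep(c(0, 1), each = n / 2))

# True effect of X on Y is 2 (an assumption of this simulation)
y <- 2 * x + 3 * u + rnorm(n)

# Balance check: U should look the same in both groups
balance <- mean(u[x == 1]) - mean(u[x == 0])

# Difference in means, without controlling for anything
estimate <- mean(y[x == 1]) - mean(y[x == 0])

balance
estimate
```

The balance check should be close to 0 and the estimate close to 2, without ever using `u` in the comparison.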
Example from an RCT: Project STAR
Q: What is the causal effect of class size on educational outcomes?
What are some potential pitfalls?
Class size and educational outcomes are probably confounded:
Parents’ wealth
Where people live
What else?
Example from an RCT: Project STAR
Q: What is the causal effect of class size on educational outcomes?
Hypothesis: Kids learn better in smaller classrooms
Research Design: Randomize the size of classrooms!
Data & Code
star <- read.csv("./code/STAR.csv")
dim(star)
[1] 1274    4
head(star)
  classtype reading math graduated
1     small     578  610         1
2   regular     612  612         1
3   regular     583  606         1
4     small     661  648         1
5     small     614  636         1
6   regular     610  603         0
table(star$classtype)
regular   small
    689     585
summary(star$math)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  515.0   604.0   631.0   631.6   659.0   774.0
Data & Code
## Two-way frequency tables
table(star$classtype, star$graduated)
            0   1
  regular  92 597
  small    74 511
## Two-way tables of proportions
prop.table(table(star$classtype, star$graduated), 1)
                  0         1
  regular 0.1335269 0.8664731
  small   0.1264957 0.8735043
# summary
summary(star$math)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  515.0   604.0   631.0   631.6   659.0   774.0
Difference-in-Means
What is the average causal effect of class size on education outcomes?
How would you answer this question?
Remember: kids were randomly assigned to small or regular classrooms, and we observe their reading and math test scores.
Difference-in-Means
# 1. Mean math score for students assigned to small classrooms
math_treat <- mean(star$math[star$classtype == "small"])
# 2. Mean math score for students in regular classrooms
math_control <- mean(star$math[star$classtype == "regular"])
# 3. Mean reading score for treatment (small classrooms)
reading_treat <- mean(star$reading[star$classtype == "small"])
# 4. Mean reading score for control (regular classrooms)
reading_control <- mean(star$reading[star$classtype == "regular"])
### difference-in-means estimators ####
math_treat - math_control
[1] 5.989905
reading_treat - reading_control
[1] 7.210547
Parenthesis: What happened to the spread?
The mean is a measure of the central tendency
But does it tell us anything about the spread of the causal effect?
What statistic could help us here?
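One candidate answer is the standard error of the difference in means. The sketch below uses simulated scores, not the STAR data. The group sizes, means, and standard deviation are assumptions chosen only to resemble test scores.

```r
set.seed(3200)

# Simulated data with the same layout as the classtype and math columns
sim <- data.frame(
  classtype = rep(c("small", "regular"), times = c(500, 500)),
  math = c(rnorm(500, mean = 636, sd = 40),
           rnorm(500, mean = 630, sd = 40))
)

treat <- sim$math[sim$classtype == "small"]
control <- sim$math[sim$classtype == "regular"]

# Point estimate: difference in means
diff_means <- mean(treat) - mean(control)

# Standard error of the difference between two independent group means
se_diff <- sqrt(var(treat) / length(treat) + var(control) / length(control))

# Approximate 95% confidence interval
ci <- diff_means + c(-1.96, 1.96) * se_diff

diff_means
se_diff
ci
```

The same standard error sits behind `t.test(treat, control)`, whose test statistic is `diff_means / se_diff`.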
Can we do observational causal research?
Causal inference hinges on independence between treatment assignment and potential outcomes
By randomizing treatment assignment, RCTs fabricate independence
But not everything can or ought to be randomized!
Cost constraints
Ethical concerns
Historical research
Is observational causal research possible?
Can we do observational causal research?
Observational causal work relies on finding and leveraging naturally occurring, as-if-random variation in treatment assignment
RQ: Does exposure to state-funded pro-Russia news make Ukrainians more pro-Russia?
Problem: People tend to not watch TV randomly!
Solution: Due to geography and topography, TV reception is as-if-random
Idea: Compare people who received Russian TV with neighbors who could not watch it, and look for differences in electoral behavior and support for Russia.
Assumptions: ????
Wrapping up
Causality requires assumptions!
DAGs are good ways to clarify our assumptions
Some assumptions are easier to sell than others
E.g. “randomization results in comparable groups” vs. “going to museums is independent of life expectancy”
Whether our conclusions are causal or not depends on whether our assumptions hold
To a large degree, these assumptions refer to what we cannot see, and are un-testable!
Careful researchers do a good job of arguing why a setting is well-suited to answer causal questions.